I don't see images in my head. Can training give me a mind's eye?

Training programmes for people with aphantasia, the inability to create mental images, are challenging neuroscientists' understanding of how we create thoughts.

What do you see when you try to picture an apple? Last December, I closed my eyes and tried to visualise a potoo. This tropical bird has a "round, kind of pill-shaped head", my mental imagery coach told me, and is covered with brown feathers. Its cartoonishly large mouth opens like a gaping smile to reveal a pink, fleshy interior, and its large irises can make its eyes seem entirely black.
Test-Time Scaling Makes Overtraining Compute-Optimal
Roberts, Nicholas, Cho, Sungjun, Gao, Zhiqi, Huang, Tzu-Heng, Wu, Albert, Orlanski, Gabriel, Trost, Avi, Buchanan, Kelly, Albarghouthi, Aws, Sala, Frederic
Modern LLMs scale at test time, e.g. via repeated sampling, where inference cost grows with model size and the number of samples. This creates a trade-off that pretraining scaling laws, such as Chinchilla, do not address. We present Train-to-Test ($T^2$) scaling laws that jointly optimize model size, training tokens, and number of inference samples under fixed end-to-end budgets. $T^2$ modernizes pretraining scaling laws with the pass@$k$ modeling used for test-time scaling, then jointly optimizes pretraining and test-time decisions. Forecasts from $T^2$ are robust across two distinct modeling approaches: measuring the joint scaling effect on task loss, and modeling the impact on task accuracy. Across eight downstream tasks, we find that when accounting for inference cost, optimal pretraining decisions shift radically into the overtraining regime, well outside the range of standard pretraining scaling suites. We validate our results by pretraining heavily overtrained models in the optimal region that $T^2$ scaling forecasts, confirming their substantially stronger performance compared to pretraining scaling alone. Finally, as frontier LLMs are post-trained, we show that our findings survive the post-training stage, making $T^2$ scaling meaningful in modern deployments.
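The abstract builds on pass@$k$ modeling for test-time scaling. As context, the standard unbiased pass@$k$ estimator from the code-generation literature (not necessarily the paper's exact formulation) can be sketched as follows; the function name is mine:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: the probability that at least one of
    k samples, drawn without replacement from n generated attempts of
    which c are correct, solves the task.

    pass@k = 1 - C(n - c, k) / C(n, k)
    """
    if n - c < k:
        # Fewer than k incorrect samples exist, so any draw of k
        # samples must contain a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# With 3 correct out of 10 attempts, a single draw succeeds 30% of the time.
print(pass_at_k(10, 3, 1))
```

Under a fixed end-to-end budget, increasing $k$ raises pass@$k$ at the cost of inference compute, which is the trade-off the $T^2$ laws optimize jointly with model size and training tokens.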
Generalized Discrete Diffusion from Snapshots
Zekri, Oussama, Uscidda, Théo, Boullé, Nicolas, Korba, Anna
We introduce Generalized Discrete Diffusion from Snapshots (GDDS), a unified framework for discrete diffusion modeling that supports arbitrary noising processes over large discrete state spaces. Our formulation encompasses all existing discrete diffusion approaches, while allowing significantly greater flexibility in the choice of corruption dynamics. The forward noising process relies on uniformization and enables fast, arbitrary corruption. For the reverse process, we derive a simple evidence lower bound (ELBO) based on snapshot latents, rather than the entire noising path, that allows efficient training of standard generative modeling architectures with a clear probabilistic interpretation. Our experiments on large-vocabulary discrete generation tasks suggest that the proposed framework outperforms existing discrete diffusion methods in terms of training efficiency and generation quality, and beats autoregressive models for the first time at this scale. We provide the code along with a blog post on the project page: \href{https://oussamazekri.fr/gdds}{https://oussamazekri.fr/gdds}.
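The forward process above relies on uniformization, a classical technique for simulating continuous-time Markov chains. A minimal, illustrative sketch for a toy state space follows; the function names, the Poisson sampler, and the example generator are mine, not from the paper:

```python
import random
from math import exp

def sample_poisson(lam: float, rng: random.Random) -> int:
    """Knuth's Poisson sampler; adequate for small rates."""
    L, k, p = exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= L:
            return k
        k += 1

def uniformized_step(x: int, Q: list[list[float]], t: float,
                     rng: random.Random) -> int:
    """Sample X_t given X_0 = x for a CTMC with generator Q (rows sum
    to 0) via uniformization: draw N ~ Poisson(lam * t) jump events,
    then apply the dominating DTMC kernel P = I + Q / lam N times."""
    n = len(Q)
    lam = max(-Q[i][i] for i in range(n))  # dominating jump rate
    P = [[(i == j) + Q[i][j] / lam for j in range(n)] for i in range(n)]
    for _ in range(sample_poisson(lam * t, rng)):
        x = rng.choices(range(n), weights=P[x])[0]
    return x

# Toy uniform corruption over 3 states: leave the current state at rate 1,
# landing on each other state with equal probability.
Q = [[-1.0, 0.5, 0.5],
     [0.5, -1.0, 0.5],
     [0.5, 0.5, -1.0]]
rng = random.Random(0)
print(uniformized_step(0, Q, 2.0, rng))
```

The appeal for diffusion modeling is that only the snapshot $X_t$ is needed, not the full jump trajectory, which matches the snapshot-latent ELBO described in the abstract.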
The original tippex! Ancient Egyptians used white pigments to amend their paintings 3,000 years ago, study finds
Before typos could be deleted with the press of a button, careless writers had to resort to sticky tubes of white Tippex to hide their errors. But archaeologists now say that clumsy scribes have been resorting to white-out for at least 3,000 years. Researchers from the Fitzwilliam Museum in Cambridge found that the Ancient Egyptians used a white pigment to amend their papyrus paintings.
Language Model Tokenizers Introduce Unfairness Between Languages
Recent language models have shown impressive multilingual performance, even when not explicitly trained for it. Despite this, there are concerns about the quality of their outputs across different languages. In this paper, we show how disparity in the treatment of different languages arises at the tokenization stage, well before a model is even invoked. The same text translated into different languages can have drastically different tokenization lengths, with differences up to 15 times in some cases. These disparities persist even for tokenizers that are intentionally trained for multilingual support.
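One mechanism behind such disparities is easy to demonstrate without any tokenizer library: a byte-level BPE tokenizer with few merges for a script degenerates toward one token per UTF-8 byte, and non-Latin scripts cost two or three bytes per character. The sketch below uses raw UTF-8 byte counts as a crude proxy for this worst case; it is illustrative only and is not the paper's measurement, and the sample sentences are mine:

```python
def utf8_bytes(text: str) -> int:
    """UTF-8 byte count: a crude lower-bound proxy for the token count
    of a byte-level tokenizer that has learned no merges for a script."""
    return len(text.encode("utf-8"))

# Roughly parallel greetings; ASCII is 1 byte/char, Cyrillic 2, Devanagari 3.
samples = {
    "English": "Hello, how are you?",
    "Russian": "Привет, как дела?",
    "Hindi":   "नमस्ते, आप कैसे हैं?",
}

baseline = utf8_bytes(samples["English"])
for lang, text in samples.items():
    print(f"{lang}: {utf8_bytes(text)} bytes "
          f"({utf8_bytes(text) / baseline:.1f}x English)")
```

Real multilingual tokenizers learn merges that compress frequent scripts, so measured ratios differ from this byte-level bound, but the direction of the disparity, with shorter sequences for heavily represented languages, is the same effect the paper quantifies.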